##  Author: Kiril Boyanov (kirilboyanov [at] gmail.com)
##  LinkedIn: www.linkedin.com/kirilboyanov/
##  Last update: 2023-12-08


In this file, we explore the correlations between happiness and various economic, political, societal, environmental and health-related factors. We do this both for the most recent year in the data and for all available years. Finally, we explore how the correlations have evolved over time.


Setting things up

Importing relevant packages, defining custom functions, specifying local folders etc.

# Importing relevant packages

# For general data-related tasks
library(plyr)
library(tidyverse)
library(data.table)
library(openxlsx)
library(readxl)
library(arrow)
library(zoo)

# For working with countries
library(countrycode)

# For statistical analysis
library(corrr)

# For data visualization
library(ggplot2)
library(plotly)
library(rjson)


User input

Throughout the analysis, we will be using a common BaseYear (to represent the past state of happiness) and a common ReferenceYear (to represent the most recent state of happiness). To ensure consistency across files, these two years are stored in a TXT file, which is imported below.
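The import step itself is not shown in this rendering. Below is a minimal sketch, in R to match the notebook, of how the two years could be read from a plain-text file. The file name and its `Key=Value` layout are assumptions, not the notebook's actual format. The file is created on the fly so the snippet runs on its own.

```r
# Hypothetical file layout: one "Key=Value" pair per line.
# The file is created here only so that the example is self-contained.
years_file <- tempfile(fileext = ".txt")
writeLines(c("BaseYear=2005", "ReferenceYear=2022"), years_file)

# Parse "Key=Value" lines into a named integer vector
read_years <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]                 # skip blank lines
  parts <- strsplit(lines, "=", fixed = TRUE)
  values <- vapply(parts, function(p) as.integer(trimws(p[2])), integer(1))
  names(values) <- vapply(parts, function(p) trimws(p[1]), character(1))
  values
}

years <- read_years(years_file)
BaseYear <- years[["BaseYear"]]
ReferenceYear <- years[["ReferenceYear"]]

# cat() separates its arguments with a space, hence two spaces after the colon
cat("Base year: ", BaseYear, "\n")
cat("Reference year: ", ReferenceYear, "\n")
```

Keeping both years in one small file means every notebook can source the same values instead of hard-coding them.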

Thus, we use the following years as base and reference:

## Base year:  2005
## Reference year:  2022


Importing data

We import data that was pre-processed in the WHR_data_prep.Rmd notebook and then subjected to missing data imputation in the Dealing_with_missing_data.Rmd notebook. A preview of the imported data is shown below:



Please note that because the data include two different measures of GDP, we drop the one based on current prices so as not to overstate the importance of GDP, which would otherwise be counted twice among the correlates.
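A minimal sketch of this column drop on a toy data frame. The column names `gdp_pc_current_usd` and `gdp_pc_ppp_constant` are hypothetical stand-ins for whatever the two GDP variables are called in the notebook.

```r
# Toy stand-in for the imputed WHR data; all column names are hypothetical
whr <- data.frame(
  country = c("A", "B", "C"),
  year = c(2022L, 2022L, 2022L),
  happiness = c(7.1, 5.4, 4.2),
  gdp_pc_current_usd = c(60000, 12000, 3000),
  gdp_pc_ppp_constant = c(55000, 18000, 5000),
  stringsAsFactors = FALSE
)

# Drop the current-price GDP measure and keep the constant-price one
whr <- whr[, setdiff(names(whr), "gdp_pc_current_usd"), drop = FALSE]
names(whr)
```

With the tidyverse loaded, `dplyr::select(whr, -gdp_pc_current_usd)` would do the same.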


What correlates with current levels of happiness?

Most important factors

We start our analysis by exploring how strongly different factors are correlated with countries' annual happiness scores. We calculate the Pearson correlation between happiness and every other variable in the data and sort the factors by absolute correlation, strongest first. A preview of the 20 strongest correlations is shown below:
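The ranking step can be sketched in base R as follows. `rank_correlates()` is a hypothetical helper, and the data are simulated with a known structure rather than taken from the WHR data. Missing values are handled pairwise, which is an assumption about how the notebook treats them.

```r
# Rank numeric variables by the absolute Pearson correlation with a target column
rank_correlates <- function(df, target = "happiness") {
  is_num <- vapply(df, is.numeric, logical(1))
  candidates <- setdiff(names(df)[is_num], target)
  r <- vapply(candidates, function(v) {
    cor(df[[target]], df[[v]], method = "pearson", use = "pairwise.complete.obs")
  }, numeric(1))
  out <- data.frame(factor = candidates, r = unname(r), abs_r = abs(unname(r)),
                    stringsAsFactors = FALSE)
  out <- out[order(out$abs_r, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Simulated data with a known structure (not the WHR data)
set.seed(42)
n <- 100
health_exp <- rnorm(n)
happiness <- 2 * health_exp + rnorm(n, sd = 0.3)
sim <- data.frame(
  country = paste0("C", seq_len(n)),   # non-numeric, so it is skipped
  happiness = happiness,
  health_exp = health_exp,
  poverty = -happiness + rnorm(n, sd = 1),
  land_area = rnorm(n),
  stringsAsFactors = FALSE
)

ranked <- rank_correlates(sim)
head(ranked, 20)   # strongest correlates first
```

Reading the same table from the bottom, for example with `tail(ranked, 10)`, gives the weakest correlates. The notebook also loads corrr, whose `correlate()` and `focus()` functions may offer a tidyverse-style route to the same table.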



The factors most strongly correlated with happiness in 2022 were health expenditure and GDP per capita, as well as government effectiveness and the poverty headcount ratio at 6.85 USD a day. Several different categories appear among the strongest correlates, with economic and political factors seemingly being the most important. Together, these two categories account for 70% of the 20 most strongly correlated factors:
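The category breakdown amounts to counting how many of the top correlates fall into each category. Below is a sketch that assumes each factor has already been assigned to exactly one category. The lookup table is hypothetical and illustrative only, so its percentages are not the notebook's results.

```r
# Share of each category among a set of top correlates, in percent
category_shares <- function(categories) {
  counts <- table(categories)
  shares <- round(100 * as.numeric(counts) / sum(counts), 1)
  names(shares) <- names(counts)
  sort(shares, decreasing = TRUE)
}

# Hypothetical factor-to-category lookup, for illustration only
top_factors <- data.frame(
  factor = c("health_exp", "gdp_ppp", "gov_effectiveness", "poverty_685", "life_expectancy"),
  category = c("Health", "Economic", "Political", "Economic", "Health"),
  stringsAsFactors = FALSE
)

shares <- category_shares(top_factors$category)
shares

# Combined share of economic and political factors
sum(shares[c("Economic", "Political")])
```

Applied to the actual top-20 list, summing the Economic and Political entries is one way to check the 70% figure quoted above.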



Least important factors

Conversely, the least important factors (shown below) include indicators such as land area, the proportion of the population aged 65+, military expenditure as % of GDP and CO2 emissions:


Grouping the least correlated factors into categories gives a much more mixed picture, with social, environmental and health-related factors being more likely to be unrelated to happiness:



What do correlations look like across time?